Memory-Access-Driven Context Partitioning for Window-Based Image Processing on Heterogeneous Multicore Processors
نویسندگان
چکیده
Accelerator cores in low-power heterogeneous processors have on-chip local memories to enable parallel data access. The memory capacities of the local memories are very small. Therefore, the data should be transferred from the global memory to the local memories many times. These data transfers greatly increase the total processing time. Memory allocation technique to increase the data sharing is a good solution to this problem. However, when using reconfigurable cores, the data must be shared among multiple contexts. However, conventional context partitioning methods only consider how to reuse limited hardware resources in different time slots. They do not consider the data sharing. This paper proposes a context partitioning method to share both the hardware resources and the local memory data. According to the experimental results, the proposed method reduces the processing time by more than 87% compared to conventional context partitioning techniques. key words: memory allocation, partitioning, reconfigurable processors
منابع مشابه
Optimization of Data-Parallel Scientific Applications on Highly Heterogeneous Modern HPC Platforms
Over the past decade, the design of microprocessors has been shifting to a new model where the microprocessor has multiple homogeneous processing units, aka cores, as a result of heat dissipation and energy consumption issues. Meanwhile, the demand for heterogeneity increases in computing systems due to the need for high performance computing in recent years. The current trend in gaining high c...
متن کاملData-Transfer-Aware Memory Allocation for Dynamically Reconfigurable Accelerators in Heterogeneous Multicore Processors
Accelerator cores in low-power heterogeneous multicore processors have multiple memory modules to enable parallel data access. Recent low-power processors contain address generation units (AGUs) for fast address generation. To reduce the core-area, small functional units such as adders and counters are used in AGUs. Such small functional units make it difficult to implement complex addressing p...
متن کاملA Daptive Block Pinning Based : D Ynamic C Ache Partitioning for M Ulti - Core Architectures
This paper is aimed at exploring the various techniques currently used for partitioning last level (L2/L3) caches in multicore architectures, identifying their strengths and weaknesses and thereby proposing a novel partitioning scheme known as Adaptive Block Pinning which would result in a better utilization of the cache resources in CMPs. The widening speed gap between processors and memory al...
متن کاملFast Transactions for Multicore In - Memory Databases by Stephen Lyle
Though modern multicore machines have sufficient RAM and processors to manage very large in-memory databases, it is not clear what the best strategy for dividing work among cores is. Should each core handle a data partition, avoiding the overhead of concurrency control for most transactions (at the cost of increasing it for crosspartition transactions)? Or should cores access a shared data stru...
متن کاملAutomatic Analysis of Scratch-Pad Memory Code for Heterogeneous Multicore Processors
Modern multicore processors, such as the Cell Broadband Engine, achieve high performance by equipping accelerator cores with small “scratchpad” memories. The price for increased performance is higher programming complexity – the programmer must manually orchestrate data movement using direct memory access (DMA) operations. Programming using asynchronous DMAs is error-prone, and DMA races can le...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- IEICE Transactions
دوره 95-D شماره
صفحات -
تاریخ انتشار 2012